Scalable Feature Mining for Sequential Data

Authors

  • Neal Lesh
  • Mohammed J. Zaki
  • Mitsunori Ogihara
Abstract

Classification algorithms are difficult to apply to sequential examples, such as text or DNA sequences, because there is a vast number of potentially useful features for describing each example. Past work on feature selection has focused on searching the space of all subsets of the available features, which is intractable for large feature sets. We adapt data mining techniques to act as a preprocessor to select features for standard classification algorithms such as Naive Bayes and Winnow. We apply our algorithm to a number of datasets, and experimentally show that the features produced by our algorithm improve classification accuracy by up to 20%.
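
The general idea of mining features as a preprocessing step can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify; it is a minimal illustration that assumes features are short subsequences kept when they are frequent within at least one class. The function names (`mine_features`, `featurize`) and the parameters `min_support` and `max_len` are hypothetical, chosen only for this example.

```python
from collections import Counter
from itertools import combinations


def subsequences(seq, max_len):
    """Return the distinct (possibly non-contiguous) subsequences of seq, up to max_len symbols."""
    found = set()
    for k in range(1, max_len + 1):
        for idx in combinations(range(len(seq)), k):
            found.add(tuple(seq[i] for i in idx))
    return found


def mine_features(examples, labels, min_support=0.5, max_len=2):
    """Keep every subsequence whose support within some class is at least min_support.

    Assumption for this sketch: per-class frequency is the only selection criterion.
    """
    by_class = {}
    for seq, label in zip(examples, labels):
        by_class.setdefault(label, []).append(seq)

    features = set()
    for seqs in by_class.values():
        counts = Counter()
        for seq in seqs:
            # Each example contributes at most once per pattern.
            counts.update(subsequences(seq, max_len))
        for pattern, count in counts.items():
            if count / len(seqs) >= min_support:
                features.add(pattern)
    return sorted(features)


def contains(seq, pattern):
    """True if pattern occurs in seq as an ordered, possibly non-contiguous subsequence."""
    it = iter(seq)
    return all(symbol in it for symbol in pattern)


def featurize(seq, features):
    """Map a sequence to a 0/1 vector, one entry per mined feature."""
    return [int(contains(seq, p)) for p in features]


examples = ["abc", "abd", "xyz", "xyw"]
labels = ["pos", "pos", "neg", "neg"]
features = mine_features(examples, labels, min_support=1.0, max_len=2)
vector = featurize("ba", features)
```

The resulting 0/1 vectors could then be passed to any standard classifier, such as Naive Bayes. Enumerating all subsequences is exponential in `max_len`, so this sketch is only practical for short patterns.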

Similar resources

Sequential Sequence Mining Technique in Mammographic Information Analysis Database

The Sequential Sequence Mining produces large sequences of biomedical data. It provides the opportunities for data analysis and knowledge discovery. The Sequential Sequence Mining provides the efficient and scalable methods to extract the sequences of interest in datasets. The synthetic dataset was taken of the medical images of mammography. The Sequential Sequence Mining technique motivates us...

Scalable Feature Selection for Large Sized Databases

Feature selection determines relevant features in the data. It is often applied in pattern classification. A special constraint for feature selection nowadays is that the size of a database is normally very large. An effective method is needed to accommodate the practical demands. A scalable probabilistic algorithm is presented here as an alternative to the exhaustive and heuristic approaches. ...

Scalable Data Mining for Rules

Data Mining is the process of automatic extraction of novel, useful, and understandable patterns in very large databases. High-performance scalable and parallel computing is crucial for ensuring system scalability and interactivity as datasets grow inexorably in size and complexity. This thesis deals with both the algorithmic and systems aspects of scalable and parallel data mining algorithms a...

Discovering and Mining User Web-page Traversal Patterns

As the popularity of WWW explodes, a massive amount of data is gathered by Web servers in the form of Web access logs. This is a rich source of information for understanding Web user surfing behavior. Web Usage Mining, also known as Web Log Mining, is an application of data mining algorithms to Web access logs to find trends and regularities in Web users' traversal patterns. The results of Web ...

A Geometric View of Similarity Measures in Data Mining

The main objective of data mining is to acquire information from a set of data for prospect applications using a measure. The concerning issue is that one often has to deal with large scale data. Several dimensionality reduction techniques like various feature extraction methods have been developed to resolve the issue. However, the geometric view of the applied measure, as an additional consid...

Journal:
  • IEEE Intelligent Systems

Volume 15  Issue 

Pages  -

Publication date: 2000